Keratin 17 and Collagen type 1 genes: Esophageal cancer molecular marker discovery and evaluation

ABSTRACT One hundred eighty pairs of tissues of esophageal squamous cell carcinoma (ESCC) were tested by the transcriptome sequencing in order to explore etiology factors. The chi‐square test and correlation analysis demonstrated that the relative expression levels of keratin 17 (KRT17) and collagen type I α1 chain (COL1A1) were significantly higher in EC with diabetes. Expression of KRT17 was correlated with blood glucose (r = 0.204, p = 0.001) and tumor size (r = −0.177, p = 0.038) in patients. COL1A1 correlated with age (r = −0.170, p = 0.029) and blood glucose levels (r = 0.190, p = 0.015). Experimental results of qRT‐PCR: KRT17 and COL1A1 genes were highly expressed in ESCC (p < 0.05). When the two genes were used as a combination test, the positive detection rate of EC was 90.6%, and the ROC curve had greater power. The KRT17 and COL1A1 genes had the potential to be biomarkers for the diagnosis of ESCC.


| INTRODUCTION
Esophageal cancer (EC) is the eighth most common cancer and the sixth most common cause of cancer death worldwide. 1According to the type of malignant cells, EC has two different pathological types: adenocarcinoma and squamous cell carcinoma. 2According to the World Health Organization report in 2022, the incidence of squamous cell carcinoma is most severe in Asia, especially in China, accounting for more than half of EC cases worldwide (Figure 1).Esophageal squamous cell carcinoma (ESCC) accounts for approximately 90% of Abbreviations: ESCC, esophageal squamous cell carcinoma; EAC, esophageal adenocarcinoma; WGS, whole genome sequencing; COL1A1, collagen type I α1 chain; KRT17, keratin 17; PCR, polymerase chain reaction; OR, odds ratio.cases worldwide and is mainly esophageal carcinoma in China. 3,4 he early clinical symptoms of EC are not obvious, patients in the advanced stage are even unable to swallow water with a poor prognosis 5-year survival rate less than 20%.Therefore, it is urgent to explore the molecular mechanisms and find screening therapeutic targets and prognostic markers. 5he treatment methods for EC mainly include surgery, radiotherapy, chemotherapy, 5 targeted therapy, 6,7 and immunotherapy 8 clinically, but with great trauma and poor recovery.Studies have shown that long-term smoking, 9 alcohol consumption, 10,11 blanching diet and the contents of some vitamins, 12,13 minerals, trace elements, 14,15 mildewed food, dietary patterns 16 are high-risk factors. 17,18 enetic studies still lack specific screening indicators, although studies have preliminarily shown that genes are associated with EC.For example, Li-Yan Guo 19 showed that the PLCE1 gene interacted with environmental factors and that the occurrence of ESCC and rs227423, a polymorphism of the PLCE1 gene, played a role in the ESCC population with family history.Cheng 20 tested the feasibility of epidermal growth factor receptor (EGFR) gene-targeted chimeric antigen receptor T-cell immunotherapy (CAR-T) cells for the treatment of ESCC in a large Chinese sample by whole genome sequencing (WGS) and RNA-seq analysis.A circRNA study 21 found that the cirbe4b-mitogen-activated protein kinases/extracellular signal-regulated kinase (MAPK/ ERK) signaling pathway could potentially be a biological target for ESCC therapy.
The collagen type I α1 chain (COL1A1) gene belongs to the collagen family and is involved in epithelialmesenchymal transition (EMT).Studies 22,23 have shown that diseases associated with COL1A1 also include Caffey's disease and type I osteogenesis imperfecta disease.
[28][29][30] The COL1A1 gene and keratin 17 (KRT17) network pathway played a role (Figure 2).KRT17 is predominantly expressed in the appendages of epidermal tissues.EC is a kind of epithelial tumor.2][33] However, no study has been conducted on the combined effect of COL1A1 and KRT17 in the Chinese ESCC population.Based on this, this study was performed.

| Specimen and clinical information collection
The study had the informed consent of the participants.All the experiments were in accordance with the Declaration of Helsinki.This study was approved by the Ethics Committee of the Affiliated People's Hospital of Jiangsu University.
Between 2018 and 2020, 180 pairs of ESCC surgical pathological tissues were collected from the Affiliated People's Hospital of Jiangsu University.The cancer tissues were the case group, and the adjacent normal tissues were the control group.The surgical tissue was immediately placed in a cryotube with an RNA tissue preservation solution and stored in liquid nitrogen.We collected personal data from questionnaires, medical records, and interviews and recorded the required information, such as smoking, drinking, blood sugar, and other indicators, in Excel tables.Clinical information includes recording gender, age, smoking and drinking habits, fasting blood glucose, past medical history of hypertension and diabetes, histological tissue differentiation degree, and tumor size.

| Exclusion criteria
Request is primary ESCC in Chinese population.Non-ESCC were excluded.Exclude patients with other cancers that have metastasized to EC. Patients who had undergone treatments prior to surgery were excluded.

| Extraction of total RNA
A total of 180 pairs of RNA were extracted by the TRIzol method, and genomic DNA was removed by DNase I.The quality of RNA samples was detected by 2100 Bioanalyser and ND-2000 methods to ensure that qualified samples (OD260/280 = 1.8-2.2,OD260/230 ≥ 2.0, RNA integrity number [RIN] ≥ 6.5, 28S: 18S ≥ 1.0, > 5 μg) were used for transcriptome sequencing.

| Transcriptome sequencing
Raw data stored in fastq format were obtained using the Illumina Hiseq sequencing platform.First, fastp was used to evaluate the sequencing-related quality of the original sequencing data, and the base quality, base error rate, and base content distribution were analyzed to obtain high-quality quality control data (clean data), ensuring the accuracy of the results.Subsequently, the high-quality de-adaptor reads data were compared with the human GRCh38.p13reference genome using the improved BWT algorithm (http://asia.ensembl.org/Homo_sapiens/Info/Index).Finally, the software StringTie (http://ccb.jhu.edu/software/stringtie/) was used to assemble the mapped reads by reference-based assembly.

| Screening for differentially expressed genes
After obtaining the read counts for genes/transcripts, DESeq2 was employed to perform differential gene expression analysis between multiple samples.This analysis identified differentially expressed transcripts.To obtain significantly different genes, the filtering conditions were set as follows: p-value < 0.05 and jFoldChangej > 2.

| Q-PCR validation of differentially expressed genes
The selected differentially expressed genes, KRT17 and COL1A1, were selected for large sample validation, and 180 pairs of tissues from patients were used, and there was no significant difference in age and gender between the two groups with comparability.Total RNA was extracted from the samples by the TRIzol method, and then the extracted total RNA was reverse transcribed into cDNA using HiScript III RT SuperMix for qPCR (+gDNA wiper), a reverse transcription kit produced by Vazyme, China, and reverse transcribed on a PCR instrument according to the following reaction conditions: Reaction system and conditions: RNase-free ddH2O, 16 μL, 4 Â gDNA wiper Mix, 4 μL, Template RNA 1 μg, 4 μL 5 Â HiScript III qRT SuperMix, 16 μL 1st reaction solution reaction condition: 42 C, 2 min.Reaction conditions: 37 C for 15 min, 85 C for 5 s, À20 C for short-term storage, À70 C for long-term storage.
Primer design for the KRT17 and COL1A1 genes was carried out using Primer 5.0 software, as shown in Table 1, and detailed specificity prediction was performed using Primer-BLAST software.The primers were synthesized by Shanghai Biotechnology Co., Ltd.

| Real-time fluorescence quantitative PCR detection of KRT17 and COL1A1 gene expression
Quantitative gene expression analysis was performed using the AceQ Universal SYBR qPCR Master Mix fluorescence quantitative reagent kit from Vazyme.The cDNA concentration was measured and diluted using Nanodrop 2000, and the diluted cDNA (100 ng/μL) was used as the template for RT-PCR.The reaction system and program are as follows (Tables 2 and 3): Quantitative expression of genes was detected using AceQ Universal SYBR qPCR Master Mix, a fluorescence quantitative kit for Vazyme.The cDNA concentration was measured and diluted using Nanodrop 2000, and the diluted cDNA (100 ng/μL) was used as the template for RT-PCR.Each RNA template was tested in triplicate.The reaction system and reaction procedure are as follows (Tables 2 and 3).

| Statistical methods
The result data were visualized as bar graphs using Graph-Pad Prism 8 software, represented as geometric means ± standard deviation.Statistical analysis was performed using SPSS 23.0 software, and the differences between the two were assessed using a t-test.The chi-square test and Bonferroni test were from SPSS 20.0 software.

| Chi-square test for the relative expression levels of KRT17 and COL1A1
Bold diabetes in Table 4: in the KRT17-RQ lower group, ESCC had diabetes mellitus of 1.61% (number = 1); however, they had diabetes mellitus of 10.26% (number = 12) in the KRT17-RQ higher group.The difference was statistically significant (p = 0.036) and the Bonferroni test was 0.047.

| Relative expression of KRT17 and COL1A1 and quantitative indicators of clinical factors 3.3 | Transcriptome sequencing results
The relative expression of COL1A1 mRNA was correlated with the age of patients (r = À0.170,p = 0.029, see T A B L E 1 Primers were designed for KRT17 and COL1A1 genes.

Gene
Primer Sequence Number Figure 3A).The relative expression level of COL1A1 mRNA was positively correlated with the blood glucose of patients (r = 0.191, p = 0.015, see Figure 3B).The relative expression level of KRT17 mRNA was positively correlated with the blood glucose of patients (r = 0.204, p = 0.001, see Figure 3C).KRT17 mRNA was positively correlated with the size of the tumor (r = 0.204, p = 0.001, see Figure 3D).

| Quality testing of transcriptome sequencing samples
Figure 4, nucleic acid running gel was performed on the extracted RNA experiment, and its RIN value and optical density (OD) ratio were determined, showing that the sequenced RNA was of good quality.

| Expression of transcriptome sequencing differential genes KRT17 and COL1A1 in tumor and normal tissues
Figure 5, KRT17 and COL1A1 genes were selected as the highly expressed significantly differential genes, which are closely related to ESCC.KRT17 and COL1A1 were screened based on expression level, fold difference, p < 0.05, and enrichment analysis.

| Differential expression of KRT17 and COL1A1 genes in tumor and normal tissues detected by real-time PCR
Individual real-time fluorescence quantification showed that KRT17 gene expression levels were significantly increased in tumor tissues from 145/180 patients with EC, and COL1A1 gene expression levels were significantly increased in tumor tissues from 152/180 patients.Results are shown in Figure 5A,B; the relative expression levels of the KRT17 and COL1A1 genes were significantly higher in tumors than normal, and the difference was statistically significant (p < 0.05) (Figure 6).

| ROC curve analysis
To assess the potential power of KRT17 and COL1A1 as diagnostic markers for EC, receiver operating characteristic curves (ROC) were used for analysis.
As shown in Figure 7A-C, the area under curve (AUC) values of KRT17 and COL1A1 for EC were found to be 0.751, 0.845, and 0.844, respectively.The AUC values of all three were ≥0.7 and when COL1A1 and KRT17 were used in combination ≥0.8, which had some power in the test.
The area under the ROC curve was greater than 0.5 (AUC > 0.5), representing that these two genes alone or in combination have reference values for detection efficacy (Table 5).
Figure 8, alone COL1A1 positive was found is 152/180 in EC patients, with a detection rate of 84.4%; alone KRT17 positive was found is 113/180 in esophageal patients, with a detection rate of 62.8%; and 163/180 positive patients were found when combined use of these two genes (COL1A1 + KRT17, the same time), with a high rate of 90.6%, which could better reduce the possibility of detection errors caused by the use of the two alone.

| DISCUSSION
COL1A1 is highly expressed in a variety of cancers and regulates cellular processes, and high COL1A1 expression is also associated with poor cancer prognosis. 5A study  based on animal observations in mice and computer imaging systems from the University of Michigan and other institutions has found that when researchers eliminate COL1A1 production from tumor cells, animal models with brain tumors may live longer, 28 which showed COL1A1 as a feasible target to disrupt tumor progression in the latest spatiotemporal analysis study of glioma heterogeneity.
Another study from Huang et al. 34 researchers detected the expression of FBXL19-AS1, miR-193a-5p, and COL1A1 by RT-PCR and western blot and determined the proliferation, migration, invasion, apoptosis, and EMT of cervical cancer cells.The results showed that high COL1A1 expression was associated with the proliferation and metastasis of cervical cancer.FBXL19-AS1 promotes the proliferation and metastasis of cervical cancer cells by sponging miR-193a-5p and up-regulating COL1A1.
Meanwhile, KRT17 played a role in DNA damage response and tumor initiation. 8,9High levels of KRT17 are associated with poor prognosis in several human cancers.Raji R. 31 experimented through mouse studies and cell comparison research showed that KRT17 high expression is positively significant with skin cancer growth between the test group and control, a lack of KRT17 delays tumor proliferation, and high levels of KRT17 expression are not conducive to the prognosis of the disease.
The similarities between this study and many of the above studies suggest that these two genes are expressed at high levels in ESCC.Although both KRT17 and COL1A1 expression levels are highly expressed in ESCC tissues, they vary according to the specific population.Relative expression of KRT17 and COL1A1 was both higher in ESCC patients with diabetes and correlated with blood glucose levels.In a nut word, those ESCC patients with high blood glucose values had higher KRT17 and COL1A1 expression levels.Research suggests that there may be an intrinsic link between high and low sugar levels in ESCC.When it comes to sugar and cancer research, researchers Hedong Han et al. 35 explored the relationship between blood glucose concentration and the risk of liver cancer.They conducted a meta-analysis of prospective studies by using a database of PubMed, EMBASE, and the Cochrane Library through October 2016 with a total of 1975 liver cancer cases.This research revealed that blood glucose increases the risk of liver cancer across the range of prediabetes and diabetes.In the next research plan, we will demonstrate the relationship between sugar and EC through meta-analysis.
On the other hand, it was negatively correlated with tumor size; that is, the smaller the tumor, the higher the KRT17 expression level; otherwise, the KRT17 expression level decreased as the tumor grew larger.It suggests that this gene is significant in the early stages of a tumor, and the KRT17 screening test should be performed early when the tumor is small so that it can be detected as soon as possible.
COL1A1 expression is negatively correlated with age (lower age corresponds to higher COL1A1 expression in patients) and positively correlated with blood glucose level (higher blood glucose, higher COL1A1 expression).All these suggest that targeted screening for COL1A1 expression should be focused on individuals of younger age (<66 years), and at the same time, those esophageal tumor patients with hyperglycemia or diabetes should be strengthened for COL1A1 gene screening.
Even so, we have to admit that there are several limitations to this research.The primary limitation of this study is the relatively small sample size, especially for the controls and ESCC group.However, this study undoubtedly shows a significant association between the KRT17 and COL1A1 genes, of course, this association need a larger ESCC sample.The findings of this single-center study would be strengthened by corroboration from other larger studies.The other differences in cell experiments were not included in the current study, and their inclusion in future research may expand the role of these two genes in ESCC and prognosis.
Our study is still ongoing with the postoperative follow-up of EC patients.Survival indicators for 5-year prognoses will be collected.We will conduct in-depth analyses, considering the expression patterns of the KRT17 gene and COL1A1, to assess their impact on both the prognosis of patients survival time and quality of life.Targeted follow-up studies on KRT17 and COL1A1 genes with ESCC survival may help not only elucidate the mechanisms that relate the two genes to tumor initiation and, more broadly, the roles of KRT17 and COL1A1 during tumorigenesis but also understand how these elements play out in different populations of ESCC and other tissues.

F I G U R E 1
Incidence profile of esophageal squamous cell carcinoma worldwide during 2022 from World Health Organization (WHO).https://www.cancer.gov/types/esophageal.

F I G U R E 2
Collagen type I α1 chain (COL1A1) or keratin 17 (KRT17) gene STRING Interaction Network Preview.(A) COL1A1 gene.STRING Interaction Network Preview; (B) KRT17 gene STRING Interaction Network Preview.From Genecards online network.

F I G U R E 4
Quality detection map of transcriptome sequencing samples.A, B, C: tumor; D, E, F: control.F I G U R E 3 Correlation analysis between keratin 17 (KRT17) and collagen type I α1 chain (COL1A1) genes relative expression and clinical factors.

F I G U R E 5
Expression levels of differential genes keratin 17 (KRT17) and collagen type I α1 chain (COL1A1) in tumor and control groups.CA, cancer group; NO, control group; TPM, transcripts per million.F I G U R E 6 qRT-PCR detecting the difference in gene expression between the two groups.Vertical axis: relative transcript level À relative expression level.Horizontal axis: sample groups (normal group and cancer tissue group).** is p < 0.01; **** is p < 0.001.F I G U R E 7 Receiver operating characteristic curves (ROC) for keratin 17 (KRT17) and collagen type I α1 chain (COL1A1).(A) KRT17 individual ROC curve profile plot.(B) Characteristic plot of COL1A1 separate ROC curve.(C) ROC curve characteristic plot for combined KRT17 + COL1A1 test.T A B L E 5 Core genes KRT17 and COL1A1 were used to test the efficacy of esophageal cancer.
T A B L E 4 Chi-square test of clinical factors for KRT17 and COL1A1 relative gene expression and dichotomous data.